Independent Component Analysis and Time-Frequency Masking for Speech Recognition in Multitalker Conditions

Authors

Dorothea Kolossa

Ramón Fernández Astudillo

Eugen Hoffmann

Reinhold Orglmeister

Abstract

When several speakers are active simultaneously, for example in meetings or noisy public places, the sources of interest need to be separated from interfering speakers and from each other in order to be robustly recognized. Independent component analysis (ICA) has proven a valuable tool for this purpose. However, ICA outputs can still contain strong residual components of the interfering speakers whenever noise or reverberation is high. In such cases, nonlinear postprocessing can be applied to the ICA outputs to reduce the remaining interference. To improve robustness to the artefacts and loss of information caused by this process, the processed speech feature vector can be treated as a random variable with time-varying uncertainty rather than as deterministic, which can greatly enhance recognition. The aim of this paper is to show the potential to improve recognition of multiple overlapping speech signals through nonlinear postprocessing together with uncertainty-based decoding techniques.
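As a rough illustration of what nonlinear postprocessing of ICA outputs can look like, the sketch below builds binary time-frequency masks: in each time-frequency cell, an output is kept only if its magnitude exceeds that of every other output by a chosen margin. This is a minimal sketch under stated assumptions, not the authors' method. The function names `binary_masks` and `apply_mask`, the `margin_db` parameter, and the nested-list spectrogram layout are all hypothetical, and the ICA stage itself is assumed to have already produced the complex spectrograms.

```python
from typing import List

# Assumed layout: spectrogram[frequency_bin][time_frame] holds a complex STFT value.
Spectrogram = List[List[complex]]
Mask = List[List[float]]


def binary_masks(outputs: List[Spectrogram], margin_db: float = 0.0) -> List[Mask]:
    """Return one binary mask per separated output (needs at least two outputs).

    A cell of output i is set to 1.0 only if its magnitude is strictly larger
    than `margin_db` decibels above the strongest competing output in that
    cell. Otherwise it stays 0.0, so ties are masked in every output.
    """
    ratio = 10.0 ** (margin_db / 20.0)  # decibel margin as an amplitude ratio
    n_bins = len(outputs[0])
    n_frames = len(outputs[0][0])
    masks = [[[0.0] * n_frames for _ in range(n_bins)] for _ in outputs]
    for k in range(n_bins):
        for t in range(n_frames):
            mags = [abs(y[k][t]) for y in outputs]
            for i, mag in enumerate(mags):
                strongest_other = max(m for j, m in enumerate(mags) if j != i)
                if mag > ratio * strongest_other:
                    masks[i][k][t] = 1.0
    return masks


def apply_mask(output: Spectrogram, mask: Mask) -> Spectrogram:
    """Multiply a spectrogram by its mask, cell by cell."""
    return [[m * y for m, y in zip(mask_row, out_row)]
            for mask_row, out_row in zip(mask, output)]


# Toy example: two outputs, one frequency bin, two frames.
y1 = [[2 + 0j, 0.1 + 0j]]
y2 = [[0.5 + 0j, 1j]]
masks = binary_masks([y1, y2])
masked_y1 = apply_mask(y1, masks[0])
masked_y2 = apply_mask(y2, masks[1])
```

Hard masking of this kind zeroes cells outright, which is one way the artefacts and loss of information mentioned in the abstract can arise. That is the motivation for passing a per-feature uncertainty to the recognizer instead of treating the masked features as exact.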

Related resources

Multitalker speech perception with ideal time-frequency segregation: effects of voice characteristics and number of talkers.

When a target voice is masked by an increasingly similar masker voice, increases in energetic masking are likely to occur due to increased spectro-temporal overlap in the competing speech waveforms. However, the impact of this increase may be obscured by informational masking effects related to the increased confusability of the target and masking utterances. In this study, the effects of targe...

Modeling the perception of multitalker speech

Listeners’ ability to understand a target speaker in the presence of one or more simultaneous competing speakers is subject to two types of masking: Energetic and informational. Energetic masking occurs when target and interfering signals overlap in time and frequency resulting in portions of target becoming inaudible. Informational masking occurs when the listener is unable to segregate the ta...

Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation.

When a target speech signal is obscured by an interfering speech wave form, comprehension of the target message depends both on the successful detection of the energy from the target speech wave form and on the successful extraction and recognition of the spectro-temporal energy pattern of the target out of a background of acoustically similar masker sounds. This study attempted to isolate the ...

A model for multitalker speech perception.

A listener's ability to understand a target speaker in the presence of one or more simultaneous competing speakers is subject to two types of masking: energetic and informational. Energetic masking takes place when target and interfering signals overlap in time and frequency resulting in portions of target becoming inaudible. Informational masking occurs when the listener is unable to distingui...

Modeling the Perception Of

Journal:

EURASIP J. Audio, Speech and Music Processing

Volume: 2010

Pages: -

Publication date: 2010
